(57) Abstract

A method of navigating through video contents by displaying one or more rows of two or more
key-frames in parallel, and of controlling actual access to said video contents, as a mapping of the
key-frame so accessed, through selective access to the displayed key-frames. Within a single user
interface organization, the method allows a choice between a first operational mode, which
arranges the key-frames on the screen in temporal order, and a second operational mode, which
arranges key-frames with multiple selectable granularities between the successive key-frames
displayed.



Claim(s) 

1. A method of navigating through video contents by displaying one or more rows of two or more
key-frames in parallel, while allowing selective access to the displayed key-frames in order to
control actual access to said video contents as a mapping of the key-frame so accessed,
characterized in that, within a single user interface organization, a choice can be made between a
first operational mode that arranges the key-frames on the screen in temporal order, and a second
operational mode that arranges key-frames with multiple, non-uniform selectable granularities
between the successive key-frames displayed.

2. A method according to claim 1, wherein, in said temporal order, audio intervals associated with
the temporally centered key-frame are reproduced one after another.

3. A method according to claim 2, wherein the sequential audio intervals constitute a substantially
continuous audio representation, in contrast to the discretely separated sequence of key-frames.

4. A method according to claim 1, wherein, in the second operational mode, an audio interval
associated with the key-frame actually accessed is reproduced.

5. A method according to claim 1, characterized by highlighting the currently selected key-frame
by enlarging it to a larger format than the other key-frames, and furthermore by detecting an
adverse video interlacing effect and, when such an effect is detected, reducing it by vertical
decimation.

6. A method according to claim 1, characterized by highlighting the currently selected key-frame
by enlarging it to a larger format than the other key-frames, and furthermore by applying
upsampling filter processing to the picture before the picture is displayed.

7. A method according to claim 1, wherein subtitles or other related information extracted for a
key-frame or for a sequence of related key-frames are displayed.

8. A device arranged to carry out the method of claim 1.



Detailed Description of the Invention 

Method and device for navigating through video contents

Background of the invention

The invention relates to a method of navigating through video contents by displaying one or more
rows of two or more key-frames in parallel, and of controlling actual access to said video contents,
as a mapping of the key-frame so accessed, through selective access to the displayed key-frames.

It has been proposed to use key-frames as representative portions of a video presentation that has
been recorded for later selective playback. A continuous video stream means that the video
remains "on"; such a stream may contain animation, a series of still pictures, or an interactive
sequence of pictures. Its character may vary: for example a movie, news, or a shopping list.
This art is described in S. W. Smoliar and H. J. Zhang, "Content-Based Video Indexing and
Retrieval", IEEE Multimedia, Summer 1994, pp. 62-72.

Key-frames can be derived from the material by a derivation algorithm at the user's premises
when the video material is received, or the video provider can label key-frames, for example so
that each video shot begins with a key-frame. A third scheme makes the key-frames succeed one
another at uniform time intervals relative to standard video speed. The invention is based on the
recognition that key-frames should be used to give the user a dynamic overview of a video
presentation, together with convenient facilities that make further access to the video material,
selection for subsequent display, deselection, or editing easy.
In general, storage on a mass medium does not permit immediate access. A particular problem of
the current move to digital, compression-coded video is that the linear storage density, which can
be expressed as the number of frames per unit of storage size, is non-uniform. It has been
proposed to add, to a high-capacity main storage medium such as tape, a secondary storage
medium that has a small capacity but a high access speed. Even so, editing video material into a
summarized, modified, or rearranged form, executing fast forward or trick modes such as fast
reverse, and subsequent display all present considerable difficulties, both from the point of view of
the user interface and from that of storage technology.
Summary of the invention

It is therefore an object of the invention to introduce greater flexibility, in particular into the user
interface organization, while giving the user a more natural feeling for the storage organization and
the video material. To this end, according to one of its aspects, the invention is characterized in
that a single user interface organization allows a choice between a first operational mode, which
arranges the key-frames on the screen in temporal order, and a second operational mode, which
arranges key-frames with multiple, non-uniform selectable granularities between the successive
key-frames displayed. When key-frames are shown in temporal order, for example succeeding one
another at uniform time intervals relative to standard video speed, fast forward and fast reverse
are easy to perform. Simple switching between hierarchical levels that have different inter-frame
granularity allows easy access and editing. The same holds when key-frames are taken, at least in
part, from the start of movie shots, or from other relevant events generated by the original film
editor. Clustering can thus be performed automatically.

Advantageously, the method highlights the currently selected key-frame by displaying it in an
enlarged format relative to the other key-frames. An adverse video interlacing effect is detected
and, if present, reduced by vertical decimation, and upsampling filter processing is applied to the
picture before display. Experience shows that video distortion is acceptable in comparatively small
key-frames, but that additional measures for picture improvement are needed when a particular
key-frame is enlarged. The inventors found that these measures bring a pleasant and useful
improvement in image quality, although the improved picture does not necessarily reach the image
quality obtained under normal conditions.
The invention also relates to a device arranged to carry out the method described above. Further
features of the invention are recited in the dependent claims.
Brief description of the drawings

These and other features and advantages of the invention are explained in detail below with
reference to preferred embodiments and to the drawings. Drawing 1 is a block diagram of a
TV-recorder combination device. Drawing 2 shows a typical structure of a video recording.
Drawing 3 shows an example design of a scroll mosaic user interface. Drawing 4 shows an example
design of a scroll list user interface. Drawing 5 shows an example design of a more extensive
graphical user interface. Drawing 6 shows the display of subtitles. Drawing 7 shows a state
diagram of the system operation.
Detailed description of preferred embodiments

The preferred embodiments relate in particular to ordinary consumers and use in a private home,
but are not limited to such use. Various advantageous aspects are as follows.

- Key-frames must be shown so that a user at a typical TV viewing distance can tell them apart.

- The number of key-frames shown simultaneously should be enough to give the user an overview
of a significant portion of the digital video material.

- Key-frames should be displayed undistorted, for example by preserving the aspect ratio.

- Preferably, the remote control device of the TV set itself serves as the user control device.

- Feedback information must be recognizable from a typical viewing distance.

- Computer concepts such as "drag and drop" should generally be made unnecessary.

- The various functions are used not continuously but only occasionally, and must be realized
accordingly.

- The user interface should reflect the familiar linear model of video.
Description of a specific embodiment

Drawing 1 is a block diagram of a TV-recorder combination device according to the invention.
Element 20 represents the TV-set display with its associated immediate-control and power supply
sections. Element 22 represents the connection to an antenna or to another type of signal
distribution entity, such as cable distribution. Where appropriate, this element may also include
an entity that derives the digital video information or a digital signal portion from the input signal.
Element 34 represents the routing of the video streams and associated information among the
various subsystems of Drawing 1. The routing is controlled by control box 28 through control
signals on line 35. Although line 35 is drawn as a single bidirectional interconnection, in practice it
may consist of any number of unidirectional or bidirectional lines. Control box 28 receives detection
signals from display 20 through line 30 and from the other subsystems 38 and 40, and controls
these subsystems. Block 38 is a linear tape recorder with a very high storage capacity, in the
multi-gigabyte range. Block 40 is a magnetic disk recorder with a high storage capacity; it has only
a fraction of the capacity of recorder 38, but its access is much faster than that of recorder 38
because it can jump across tracks. Together, blocks 38 and 40 form a two-level storage
organization similar to a computer memory cache system, and every item of a video presentation
is stored at least once. Element 24 represents the remote control device, which communicates
with display device 20 over wireless path 26 and, indirectly, with subsystem 28 and the other
subsystems 38 and 40.

Drawing 2 shows a typical structure of a video presentation. For functional purposes, bar 60
contains the video itself, either in the form of frames or as a string of compressed video such as
MPEG code. Information is stored along the bar as video time advances, but the actual storage
requirement need not be uniform over playback time. The interspersed key-frames are shown as
black vertical stripes such as 68. Each key-frame is taken to represent, or to be typical of, all the
video in the interval up to the next key-frame. Key-frames can be selected as the first frame of
each new shot by the video provider, who attaches a label or inserts a "table of contents" (TOC);
alternatively, an algorithm at the receiver can detect that the video content changes abruptly from
one frame to the next. The invention takes such an algorithm as given. As shown in the drawing,
the distribution of key-frames may be non-uniform. Another mechanism places successive
key-frames at regular intervals, for example every 2 to 3 seconds. In this example, indicator 62
shows only the key-frames. The key-frames are further organized in a hierarchy, and indicator 64
shows only a limited set of higher-level key-frames.

This hierarchy may have multiple levels, so that indicator 66 is associated with only a single
key-frame for the whole video presentation 60. The various levels of key-frames can be defined
and ordered within the organization described above or within a different organization.
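The hierarchy of indicators 62, 64, and 66 can be pictured with a minimal sketch. It assumes, purely for illustration, that each key-frame carries a time stamp and an integer level, with level 0 the finest; the function name, the tuple layout, and the sample values are hypothetical and do not come from the specification.

```python
def keyframes_at_level(keyframes, level):
    """Return the time stamps of key-frames visible at a given granularity.

    keyframes: list of (time_s, level) tuples; level 0 is the finest level
    (hypothetical representation, not taken from the specification).
    A key-frame of a higher level stays visible at every lower level.
    """
    return [time_s for time_s, lvl in keyframes if lvl >= level]


# Illustrative data only: five key-frames spread over three levels.
SAMPLE = [(0, 2), (5, 0), (12, 1), (20, 0), (31, 1)]
ALL_KEYFRAMES = keyframes_at_level(SAMPLE, 0)   # like indicator 62
HIGH_LEVEL = keyframes_at_level(SAMPLE, 1)      # like indicator 64
SINGLE = keyframes_at_level(SAMPLE, 2)          # like indicator 66
```

Choosing a higher level yields a coarser granularity between the key-frames displayed, which is what the second operational mode selects.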

With the main body of a video presentation stored on tape recorder 38 of Drawing 1, the storage
mapping can be such that at least the key-frames are stored on disk recorder 40, each together
with a short video and/or audio interval immediately following the key-frame. The length of such
an interval can be matched to the latency of linear tape recorder 38, so that real-time access is
attained. A video presentation may be essentially linear, like a movie. Other uses may include
animation, still pictures, or other pictures that the consumer uses, stored over a predetermined
interval.
A given key-frame can be suppressed. This effectively joins the time interval preceding that
key-frame with the interval following it. The intervals can be separated again by a reset function.
Key-frames of various classes can be suppressed, for example the class of key-frames spaced at
fixed time intervals. Key-frames of different classes, for example key-frames introduced by the
provider and key-frames generated by a local algorithm on reception, can be used within one
presentation.
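The effect of suppressing a key-frame on the intervals can be sketched as follows. This is an illustration under stated assumptions: key-frames are identified by their start times in seconds, and the function name and arguments are hypothetical.

```python
def intervals(keyframe_times, end_time, suppressed=frozenset()):
    """List the (start, end) intervals delimited by the active key-frames.

    A suppressed key-frame no longer delimits an interval, so the interval
    before it and the interval after it are joined. Calling the function
    again without `suppressed` plays the role of the reset function.
    """
    active = [t for t in sorted(keyframe_times) if t not in suppressed]
    return list(zip(active, active[1:] + [end_time]))


ORIGINAL = intervals([0, 10, 25], 40)
MERGED = intervals([0, 10, 25], 40, suppressed={10})
```

Here suppressing the key-frame at 10 seconds joins the first two intervals into one that runs from 0 to 25 seconds.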

Drawing 3 shows an example design of a scroll mosaic user interface. Each screen presents 20
key-frames, running from the upper left to the lower right. Each key-frame carries a number that
gives its overall rank among the key-frames. Here, key-frame 144 is highlighted by a rectangular
control cursor. With the navigation controls of the remote control, the user can move the cursor
freely over the displayed key-frames and over the buttons in the bars displayed at the top and
bottom of the screen. If the user moves the control cursor to the left from the upper left corner,
the display jumps back by 20 key-frames. If the cursor is moved to the right from the lower right
corner, the display jumps forward by 20 key-frames. Accessing the bar at the top of the screen
controls access to other portions of the presentation, which is divided into five portions of equal
length. A black horizontal bar shows the time covered by the 20 key-frames displayed, relative to
the whole presentation.
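The paging behavior at the corners of the mosaic can be sketched as below. The sketch treats the 20 key-frames as a flat list of cursor positions 0 to 19 and assumes that the cursor stays in the same corner after a page jump; that detail, like the function and parameter names, is an assumption and is not stated in the text.

```python
PAGE = 20  # key-frames per screen, as in Drawing 3


def move_cursor(first, cursor, step, total):
    """Move the control cursor one position left (step=-1) or right (step=+1).

    first:  overall index of the key-frame shown at the upper left
    cursor: cursor position on the screen, 0 (upper left) .. 19 (lower right)
    total:  number of key-frames in the presentation
    Returns the new (first, cursor) pair.
    """
    if step < 0 and cursor == 0:
        first = max(0, first - PAGE)            # jump back 20 key-frames
    elif step > 0 and cursor == PAGE - 1:
        if first + PAGE < total:
            first += PAGE                       # jump forward 20 key-frames
    else:
        cursor = min(PAGE - 1, max(0, cursor + step))
    return first, cursor
```

For example, `move_cursor(140, 0, -1, 400)` pages back to the screen that starts at key-frame 120, while a move in the middle of the screen only shifts the cursor.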

Other functions can be started by first selecting a particular key-frame and then selecting one of
the buttons at the bottom. The "view program" button starts playback from the key-frame accessed
with the cursor. The "view segment" button does the same as "view program", except that only
the single segment that ends at the next key-frame is played. The "view from x to y" button starts
playback at the earlier of two key-frames accessed with the cursor and stops it at the later one.
Other modes can be realized through a key-frame function preselection capability: for example,
fast-forward or slow-forward, which lets the user check for the occurrence of a particular interval,
or fast/slow reverse, which achieves a predetermined video effect. During display, at the instant
associated with a particular key-frame, that key-frame becomes active and the video stream is
displayed until the instant associated with the next key-frame is reached. When the next
key-frame is reached, that frame becomes the active frame. With such functions the user can
program a VCR in a straightforward manner with respect to the displayed sequence of intervals,
for example by deleting predetermined segments such as commercials, or by directing attention
to a particular detail through slow-forward. During display, the audio can be activated,
suppressed, or smoothed with a control button that is not shown. Alternatively, the audio can
continue while the video cursor is made discrete, so that it only steps from interval to interval with
a suitable highlight indication.

Drawing 4 shows a typical example design of a scroll list user interface. In this example, each
screen shows five key-frames along the bottom, and key-frame 145 is highlighted by a rectangular
control cursor that moves along the row. Key-frame 145 is also shown at large magnification in the
background. The control interface is the same as that of Drawing 3, although the buttons are
positioned differently. The enlarged key-frame can also be suppressed in the multi-key-frame bar.
Drawing 5 shows a more extensive graphical user interface. The columns on the left and right
contain control buttons for play, stop, select, cut, paste, fast reverse, zoom+, zoom-, and fast
forward. The row at the bottom holds a sequence of nine key-frames, each associated with a
different scene or shot, which have little mutual correlation. By stepping through the key-frame
hierarchy, a good dynamic overview can be obtained from scene to scene. The key-frame interval
can be, for example, 10 seconds, but larger or smaller intervals can also be used.
In particular, when the interval between successive key-frames is small, a function like fast
forward is realized. On the other hand, an interval of the same size can be used with full playback
of all the audio while the video merely jumps from one key-frame to the next. In this case, the
central key-frame is displayed enlarged. If key-frames at small intervals, that is, with sufficiently
fine granularity, are played back, the enlarged key-frame can be shown dynamically and a fast
forward (or fast reverse) mode can be performed. In this example, when the material of the next
key-frame, which shows a sailing boat, is reached, the row at the bottom shifts one position to the
left, the "sun" at the left end disappears, and a new key-frame appears from the right end. Such a
display can in particular be mapped onto the presentation from background storage, and can run
at a frame rate higher than that of standard video.

Drawing 6 shows the presentation of subtitles within the general format described for Drawing 5.
Space 50 in the central region is assigned to the actual frame, and space 52 is assigned to the
display of subtitles extracted from the video presentation, or of other related information, for
example speech converted to text for hearing-impaired persons, or a translation into a language
other than the one actually used. Subtitles need not be taken only from the range associated with
the seven key-frames at the bottom of the screen; their relevance can extend further. Each
key-frame has a time code 54, or other associated data, placed on it. The two sequences of
control buttons, 56 and 58, are assigned to application operations on the left-hand side and to
intra-program operations on the right-hand side. The top of the screen shows the title 60 of the
video program currently displayed.

A dynamic presentation is provided, in which a video cursor moves with time within the currently
active key-frame field, because a purely static presentation of key-frames is insufficient to convey
the dynamics of the video presentation as a whole and to let the user follow the unfolding of
events. For this reason the semantics are enhanced as follows. After a pause of predetermined
duration, the key-frame that the cursor "contains" comes "alive": the system starts to display the
digital video material at reduced size, including the associated audio and other effects. When
playback reaches the next key-frame, the cursor automatically "jumps" to the next key-frame
shown in the user interface, and this continues until the user (again) starts interacting with the
system. In general, the organization described here makes it possible to browse information that
is distinct from the complete video string. Even when only the audio is played dynamically while
the video jumps from one key-frame to the next, the user can obtain a good impression of the
video presentation at low storage requirements.

In this respect, Drawing 7 is a state diagram of the system operation. In state 100, the system
displays a number of key-frames and waits for input from the user. Such input can include a jump
between the key-frames currently displayed, a jump to another group of key-frames, and selection
of a key-frame in order to display the associated interval. Any such input traverses arrow 104 and
starts a new time interval. If no such input occurs for n seconds (for example, 20 seconds), arrow
108 is traversed and state 102 is reached. In this state, the system runs the dynamic video cursor
frame. As long as no user input is received, arrow 110 is traversed and the system continues
displaying for as long as displayable video material is available. When user input is received,
arrow 106 is traversed and the system stops at the actual content position of the dynamic video
cursor frame, or at the starting position of the current interval.
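The state diagram can be illustrated with a small sketch. The state numbers and arrows follow Drawing 7 as described; the class name, the method names, and the tick-based timing are assumptions made for illustration, and the handling of exhausted video material is left out.

```python
from dataclasses import dataclass

BROWSE, AUTOPLAY = 100, 102  # state numbers used in Drawing 7


@dataclass
class Navigator:
    timeout_s: float = 20.0   # the "n seconds" of the text, e.g. 20 seconds
    state: int = BROWSE
    idle_s: float = 0.0

    def on_user_input(self):
        # Arrow 104 (from state 100) or arrow 106 (from state 102):
        # return to browsing and start a new time interval.
        self.state = BROWSE
        self.idle_s = 0.0

    def tick(self, dt_s):
        # Advance time; without input for timeout_s seconds, arrow 108 leads
        # to state 102, where the dynamic video cursor runs (arrow 110 loops).
        if self.state == BROWSE:
            self.idle_s += dt_s
            if self.idle_s >= self.timeout_s:
                self.state = AUTOPLAY


nav = Navigator()
nav.tick(19.0)
STATE_AFTER_19S = nav.state
nav.tick(1.0)
STATE_AFTER_20S = nav.state
nav.on_user_input()
STATE_AFTER_INPUT = nav.state
```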

Detection and filtering of key-frames

Some of the key-frames used for browsing the content of a video program may be affected by the
"interlacing" effect, because they are extracted from sequences with high motion. The effect
arises because video sequences are usually coded in interlaced coding mode. A complete frame is
composed of two fields, with the even lines belonging to one field and the odd lines to the other,
and an annoying zigzag effect results. The problem is even more noticeable and annoying in small
key-frames, and when the picture is enlarged so that lines become thick blocks, the effect stands
out still further.

First, the key-frames affected by the interlacing effect must be detected. The effect is observed
across the lines of a picture and produces high-frequency luminance variation. To exploit this, the
spatial frequency spectrum is divided into a number of subbands and only the high-frequency
component is considered. In fact, the effect to be detected shows luminance values that alternate
between even and odd lines, and therefore corresponds to the highest frequency that the
sampling of the picture allows. Only the coefficient of the highest-frequency component of a
frequency transform (FFT or, preferably, DCT) over a sequence of lines needs to be calculated.
When a picture is affected by the zigzag effect, this component has a high value.

However, the effect appears only on objects in motion, in particular objects whose motion has a
horizontal component. The total of the coefficients over the whole picture should therefore not be
used, because that total also takes a high value for pictures with detailed, high-contrast patterns
and would eventually cause errors. A much better result is obtained by dividing the picture into
several small zones and considering the maximum per zone. For example, when the two peak
values of each zone are added, the total is no longer strongly influenced by detailed pictures.
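A minimal sketch of this detection follows. It computes, for every column of a zone, only the highest-frequency DCT-II coefficient along the vertical direction, adds the two largest values per zone, and reports the largest zone score. The zone size, the use of the maximum over zones as the final score, and all names are assumptions for illustration; the text does not fix them, and a decision threshold would have to be chosen experimentally.

```python
import math


def highest_dct_coeff(column):
    """Magnitude of the highest-frequency DCT-II coefficient of one column."""
    n = len(column)
    k = n - 1  # index of the highest-frequency basis function
    return abs(sum(value * math.cos(math.pi / n * (i + 0.5) * k)
                   for i, value in enumerate(column)))


def interlace_score(frame, zone_w=8, zone_h=8):
    """Score a luminance frame (list of rows) for the zigzag effect.

    The frame is split into zone_h x zone_w zones. In each zone the two
    largest column coefficients are added; the best zone gives the score.
    """
    height, width = len(frame), len(frame[0])
    best = 0.0
    for top in range(0, height - zone_h + 1, zone_h):
        for left in range(0, width - zone_w + 1, zone_w):
            coeffs = sorted(
                (highest_dct_coeff([frame[top + r][c] for r in range(zone_h)])
                 for c in range(left, left + zone_w)),
                reverse=True)
            best = max(best, sum(coeffs[:2]))
    return best


# Illustrative frames: a uniform zone and a zone whose lines alternate.
FLAT = [[128] * 8 for _ in range(8)]
COMBED = [[255 if r % 2 else 0] * 8 for r in range(8)]
```

A key-frame would be flagged when `interlace_score` exceeds the chosen threshold; the alternating frame scores far higher than the uniform one.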

Finally, since low vertical resolution is less annoying than the zigzag effect, the easiest way to
filter such a picture is to consider only one field and to upsample it vertically by a factor of two.
The interpolation filter described in the next section can be applied to the resulting picture before
it is displayed.

Detection and correction are therefore performed as follows. First, one field is discarded by
removing half of the lines, either the even lines or the odd lines. Then, to restore the original size
of the key-frame, the picture is upsampled by a factor of two and interpolation filter processing is
applied. Here the interpolation filter performs simple linear interpolation.
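The correction step can be sketched as follows, assuming a frame given as a list of rows of luminance values. The sketch keeps the even lines; keeping the odd lines would work symmetrically. The function name and the repetition of the last kept line at the bottom edge are assumptions for illustration.

```python
def drop_field_and_interpolate(frame):
    """Discard the odd lines, then restore the height by linear interpolation.

    Each kept (even) line is followed by the average of that line and the
    next kept line; at the bottom edge the last kept line is repeated.
    """
    field = frame[0::2]                      # vertical decimation: one field
    out = []
    for i, row in enumerate(field):
        nxt = field[i + 1] if i + 1 < len(field) else row
        out.append(list(row))
        out.append([(a + b) / 2 for a, b in zip(row, nxt)])
    return out[:len(frame)]                  # trim when the height is odd


# Illustrative one-pixel-wide frame whose odd lines carry the other field.
RESTORED = drop_field_and_interpolate([[0], [99], [10], [99]])
```

In this example the odd lines (value 99) disappear and are replaced by values interpolated between the even lines.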

Upsampling and interpolation

To be easily visible from TV viewing distance, a key-frame must be enlarged almost to full-screen
size by upsampling followed by an interpolation filter. Since key-frames generally have low
resolution, a high magnification is needed. Without further processing, the pixels become big
blocks and the result is not pleasant to view. The picture must therefore be filtered. A trade-off
has to be found, however, between producing a picture of good quality displayed at high
resolution and processing fast enough to give a short response time. The difficulty is that the
picture has to be enlarged on the fly.

That is, the picture cannot be enlarged and filtered once in advance, because the space needed to
store the result on a hard disk would be too large. Upsampling and filtering must therefore be
made as fast as possible while maintaining an acceptable result. In general, a conventional
interpolation filter from digital signal processing can be used; see, for example, H. C. Andrews and
C. L. Patterson, "Digital Interpolation of Discrete Images", IEEE Trans. Comput., 1976, vol. 25,
pp. 196-202.

Other techniques that raise image quality can also be used. In particular, wavelet and fractal
methods impose a high computational burden but give remarkable results in visual quality.

Fractal compression is well known in practice, and can reconstruct or simulate detail at high
resolution by iterating the same decoding process. In this case, the picture to be stored is a
fractal-compressed picture, which also yields a high compression ratio. Similarly, by using a
wavelet transform, high-frequency components can be predicted at still higher scales, and
high-resolution pictures without blurring can be obtained.
Text search of video programs based on subtitles

In current video transmission, subtitles are often transmitted together with the program (in many
cases they are inserted in the vertical blanking interval in analog systems, and into an elementary
stream in digital transmission). Subtitles are used for programs distributed in a foreign language,
or for hearing-impaired persons. Such information is usually superimposed on the screen, but it
can also be recorded on a storage medium. In that case the dialogue of a program, and sometimes
the description of sounds for hearing-impaired persons, can be used for search processing.
This kind of information must be extracted in real time while the program is being recorded.
When this technique is combined with a key-frame extraction routine, each picture can be linked
with the associated text, that is, with the dialogue of the program part from which the key-frame
was extracted. Text browsing based on particular keywords can then be performed with existing
text browsing techniques. Simple keyword-based queries can be performed, much as with the
tools now commonly used in Web search engines.
As an example, suppose a news program has been recorded. If the keyword "France" is entered to
search for news about France, the system automatically looks for this word in the text of the
program. If the result is positive, the user is shown the key-frames for the portions of the program
in which the keyword was found, together with the relevant part of the subtitles. The user can
then watch the program from that particular point. When a query finds many key-frames, all of
them are displayed along the bottom of the screen as shown in Drawing 5, and the user can
examine the associated text in the large window, one key-frame at a time. When the result is
negative, related keywords (France, Paris) can be used. The system is equally useful for searching
a sports program for reports covering a particular team or sport.
Many other uses are possible. For example, the system can be used to check whether a movie is
suitable for children, by checking whether language used in the dialogue appears in a list of
"improper language."
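The keyword query can be sketched in a few lines, assuming that each key-frame has already been paired with the subtitle text of its segment. The data layout, the function name, and the sample subtitles are hypothetical; a simple case-insensitive substring match stands in for whatever text browsing technique is actually used.

```python
def search_keyframes(segments, keywords):
    """Return the key-frame ids whose subtitle text contains any keyword.

    segments: list of (keyframe_id, subtitle_text) pairs (assumed layout)
    keywords: words to look for, matched case-insensitively as substrings
    """
    wanted = [word.lower() for word in keywords]
    return [keyframe_id for keyframe_id, text in segments
            if any(word in text.lower() for word in wanted)]


# Illustrative data only.
NEWS = [
    (12, "The weather tomorrow will be mild."),
    (13, "Talks continued in Paris late into the night."),
    (14, "France announced a new rail link."),
]
HITS_FRANCE = search_keyframes(NEWS, ["France"])
HITS_RELATED = search_keyframes(NEWS, ["France", "Paris"])
```

Widening the query to related keywords, as the text suggests when the first query fails, simply means passing a longer keyword list.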

Possible extensions of such a system are:

- extracting text from still pictures of the screen, for example by OCR, when the text is not
available separately from the video;

- extracting the dialogue from the program by speech recognition. In this case the system works
regardless of the service provided by the broadcaster, and when no subtitles are provided, the
system can be trained so that text browsing based on at least some keywords is always possible.



Drawing 1

Drawing 2